
Overview

Brought to you by YData

Dataset statistics

Number of variables: 27
Number of observations: 22
Missing cells: 66
Missing cells (%): 11.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 4.8 KiB
Average record size in memory: 222.0 B

Variable types

Numeric: 3
Categorical: 20
DateTime: 1
Text: 2
Boolean: 1

Alerts

Able to ViablyPassage in nude mice is highly overall correlated with Age atDiagnosis and 15 other fields [High correlation]
Age atDiagnosis is highly overall correlated with Able to ViablyPassage in nude mice and 16 other fields [High correlation]
Age atSampling is highly overall correlated with Able to ViablyPassage in nude mice and 16 other fields [High correlation]
BiologicalSex is highly overall correlated with Age atDiagnosis and 11 other fields [High correlation]
BiopsySite is highly overall correlated with Able to ViablyPassage in nude mice and 20 other fields [High correlation]
CollectionDate is highly overall correlated with Able to ViablyPassage in nude mice and 20 other fields [High correlation]
Date ofDiagnosis is highly overall correlated with Able to ViablyPassage in nude mice and 20 other fields [High correlation]
Grade StageInformation is highly overall correlated with Able to ViablyPassage in nude mice and 14 other fields [High correlation]
Has KnownMetastaticDisease is highly overall correlated with Able to ViablyPassage in nude mice and 10 other fields [High correlation]
Has Smoked100 Cigarettes is highly overall correlated with Age atDiagnosis and 13 other fields [High correlation]
Human PathogenTesting Summary is highly overall correlated with Age atDiagnosis and 13 other fields [High correlation]
MSI Status is highly overall correlated with Able to ViablyPassage in nude mice and 13 other fields [High correlation]
ModelNotes is highly overall correlated with Able to ViablyPassage in nude mice and 20 other fields [High correlation]
Molecular andIHC Data is highly overall correlated with Able to ViablyPassage in nude mice and 21 other fields [High correlation]
Occupation is highly overall correlated with Able to ViablyPassage in nude mice and 20 other fields [High correlation]
PDX GrowthCurve Avail is highly overall correlated with Able to ViablyPassage in nude mice and 13 other fields [High correlation]
Patient ID is highly overall correlated with Able to ViablyPassage in nude mice and 18 other fields [High correlation]
PatientNotes is highly overall correlated with Able to ViablyPassage in nude mice and 20 other fields [High correlation]
ProvidedTissue Origin is highly overall correlated with BiopsySite and 10 other fields [High correlation]
Self-ReportedEthnicity is highly overall correlated with Age atDiagnosis and 13 other fields [High correlation]
Self-ReportedRace is highly overall correlated with Able to ViablyPassage in nude mice and 13 other fields [High correlation]
Specimen ID is highly overall correlated with Able to ViablyPassage in nude mice and 20 other fields [High correlation]
StandardizedRegimen is highly overall correlated with BiologicalSex and 3 other fields [High correlation]
Timing is highly overall correlated with StandardizedRegimen [High correlation]
ProvidedTissue Origin is highly imbalanced (56.1%) [Imbalance]
Date RegimenStarted has 11 (50.0%) missing values [Missing]
Best Response has 15 (68.2%) missing values [Missing]
AdditionalMedicalHistory has 8 (36.4%) missing values [Missing]
Molecular andIHC Data has 11 (50.0%) missing values [Missing]
Occupation has 3 (13.6%) missing values [Missing]
PatientNotes has 7 (31.8%) missing values [Missing]
ModelNotes has 11 (50.0%) missing values [Missing]
Date ofDiagnosis is uniformly distributed [Uniform]
Occupation is uniformly distributed [Uniform]
Specimen ID is uniformly distributed [Uniform]
CollectionDate is uniformly distributed [Uniform]
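
The missing-cell total in the dataset statistics can be cross-checked against the per-column Missing alerts. This is a minimal sketch, assuming the seven Missing alerts above account for every missing cell (the report does not say so explicitly); the name `missing_per_column` is illustrative only.

```python
# Missing counts copied from the "Missing" alerts in this report.
missing_per_column = {
    "Date RegimenStarted": 11,
    "Best Response": 15,
    "AdditionalMedicalHistory": 8,
    "Molecular andIHC Data": 11,
    "Occupation": 3,
    "PatientNotes": 7,
    "ModelNotes": 11,
}

n_variables = 27     # "Number of variables"
n_observations = 22  # "Number of observations"

# Total missing cells and their share of all cells in the table.
missing_cells = sum(missing_per_column.values())
total_cells = n_variables * n_observations
missing_pct = round(100 * missing_cells / total_cells, 1)

print(missing_cells, total_cells, missing_pct)
```

Under that assumption the sum and percentage agree with the "Missing cells" figures reported in the dataset statistics.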

Reproduction

Analysis started: 2025-07-15 01:49:53.056723
Analysis finished: 2025-07-15 01:49:57.868900
Duration: 4.81 seconds
Software version: ydata-profiling v4.16.1
Download configuration: config.json
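
The reported duration follows directly from the two timestamps. A small sketch using only the standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2025-07-15 01:49:53.056723")
finished = datetime.fromisoformat("2025-07-15 01:49:57.868900")

# Elapsed wall-clock time in seconds, formatted to two decimals.
duration_seconds = (finished - started).total_seconds()
duration_label = f"{duration_seconds:.2f} seconds"
print(duration_label)
```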

Variables

Patient ID
Real number (ℝ)

High correlation 

Distinct: 10
Distinct (%): 45.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 599045.59
Minimum: 111316
Maximum: 949853
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 308.0 B

Quantile statistics

Minimum: 111316
5-th percentile: 118081.8
Q1: 376338.5
median: 636974
Q3: 814656
95-th percentile: 947329.1
Maximum: 949853
Range: 838537
Interquartile range (IQR): 438317.5

Descriptive statistics

Standard deviation: 269572.53
Coefficient of variation (CV): 0.45000336
Kurtosis: -1.0015044
Mean: 599045.59
Median Absolute Deviation (MAD): 207207
Skewness: -0.47649441
Sum: 13179003
Variance: 7.2669347 × 10^10
Monotonicity: Increasing
Histogram with fixed size bins (bins=10)
Common values

Value | Count | Frequency (%)
738633 | 3 | 13.6%
814656 | 3 | 13.6%
246632 | 2 | 9.1%
111316 | 2 | 9.1%
358529 | 2 | 9.1%
429767 | 2 | 9.1%
636974 | 2 | 9.1%
627122 | 2 | 9.1%
899375 | 2 | 9.1%
949853 | 2 | 9.1%

Minimum 10 values

Value | Count | Frequency (%)
111316 | 2 | 9.1%
246632 | 2 | 9.1%
358529 | 2 | 9.1%
429767 | 2 | 9.1%
627122 | 2 | 9.1%
636974 | 2 | 9.1%
738633 | 3 | 13.6%
814656 | 3 | 13.6%
899375 | 2 | 9.1%
949853 | 2 | 9.1%

Maximum 10 values

Value | Count | Frequency (%)
949853 | 2 | 9.1%
899375 | 2 | 9.1%
814656 | 3 | 13.6%
738633 | 3 | 13.6%
636974 | 2 | 9.1%
627122 | 2 | 9.1%
429767 | 2 | 9.1%
358529 | 2 | 9.1%
246632 | 2 | 9.1%
111316 | 2 | 9.1%
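
The Patient ID summary statistics can be reproduced from the frequency table alone. This is a minimal standard-library sketch; it assumes the report uses the sample (n - 1) variance, which is consistent with the printed standard deviation and variance, and the variable names are illustrative.

```python
import statistics

# (value, count) pairs copied from the Patient ID frequency table.
value_counts = [
    (111316, 2), (246632, 2), (358529, 2), (429767, 2), (627122, 2),
    (636974, 2), (738633, 3), (814656, 3), (899375, 2), (949853, 2),
]

# Expand the table back into the 22 individual observations.
values = [v for v, n in value_counts for _ in range(n)]

total = sum(values)
mean = statistics.mean(values)
median = statistics.median(values)
value_range = max(values) - min(values)
variance = statistics.variance(values)  # sample variance (n - 1)
std = statistics.stdev(values)          # sample standard deviation
cv = std / mean                         # coefficient of variation
# Median absolute deviation around the median.
mad = statistics.median([abs(v - median) for v in values])

print(total, mean, median, value_range, variance, std, cv, mad)
```

The coefficient of variation is simply the standard deviation divided by the mean, and the variance is the square of the standard deviation, so these three report fields carry the same information.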

Timing
Categorical

High correlation 

Distinct: 2
Distinct (%): 9.1%
Missing: 0
Missing (%): 0.0%
Memory size: 308.0 B

Length

Max length: 7
Median length: 5
Mean length: 5.9090909
Min length: 5

Characters and Unicode

Total characters: 130
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Current
2nd row: Prior
3rd row: Prior
4th row: Current
5th row: Prior

Common Values

Value | Count | Frequency (%)
Prior | 12 | 54.5%
Current | 10 | 45.5%

Length

Histogram of lengths of the category

Word | Count | Frequency (%)
prior | 12 | 54.5%
current | 10 | 45.5%

Most occurring characters

Character | Count | Frequency (%)
r | 44 | 33.8%
P | 12 | 9.2%
i | 12 | 9.2%
o | 12 | 9.2%
C | 10 | 7.7%
u | 10 | 7.7%
e | 10 | 7.7%
n | 10 | 7.7%
t | 10 | 7.7%
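
The character statistics for Timing follow from the two category counts. A minimal sketch with `collections.Counter`; it assumes characters are counted case-sensitively across all 22 rows, which matches the table above, and the variable names are illustrative.

```python
from collections import Counter

# Category counts copied from the Common Values table for Timing.
category_counts = {"Prior": 12, "Current": 10}

# Weight each character by the number of rows holding that category.
char_counts = Counter()
for value, n in category_counts.items():
    for ch in value:
        char_counts[ch] += n

total_chars = sum(char_counts.values())
n_rows = sum(category_counts.values())
mean_length = total_chars / n_rows
r_share = round(100 * char_counts["r"] / total_chars, 1)

print(total_chars, len(char_counts), mean_length, r_share)
```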


Date RegimenStarted
Date

Missing 

Distinct: 11
Distinct (%): 100.0%
Missing: 11
Missing (%): 50.0%
Memory size: 308.0 B
Minimum: 2007-05-01 00:00:00
Maximum: 2021-08-01 00:00:00
Invalid dates: 0
Invalid dates (%): 0.0%
Histogram with fixed size bins (bins=11)

StandardizedRegimen
Categorical

High correlation 

Distinct: 6
Distinct (%): 27.3%
Missing: 0
Missing (%): 0.0%
Memory size: 308.0 B

Length

Max length: 18
Median length: 17
Mean length: 15.954545
Min length: 9

Characters and Unicode

Total characters: 351
Distinct characters: 25
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 3
Unique (%): 13.6%

Sample

1st row: Ripretinib
2nd row: Imatinib Mesylate
3rd row: Treatment naive
4th row: Treatment naive
5th row: Imatinib Mesylate

Common Values

Value | Count | Frequency (%)
Imatinib Mesylate | 8 | 36.4%
Treatment naive | 6 | 27.3%
No Current Therapy | 5 | 22.7%
Ripretinib | 1 | 4.5%
Sunitinib Malate | 1 | 4.5%
Radiation | 1 | 4.5%

Length

Histogram of lengths of the category

Word | Count | Frequency (%)
imatinib | 8 | 17.0%
mesylate | 8 | 17.0%
treatment | 6 | 12.8%
naive | 6 | 12.8%
no | 5 | 10.6%
current | 5 | 10.6%
therapy | 5 | 10.6%
ripretinib | 1 | 2.1%
sunitinib | 1 | 2.1%
malate | 1 | 2.1%

Most occurring characters

Character | Count | Frequency (%)
e | 46 | 13.1%
t | 37 | 10.5%
a | 37 | 10.5%
i | 30 | 8.5%
n | 29 | 8.3%
(space) | 25 | 7.1%
r | 22 | 6.3%
m | 14 | 4.0%
y | 13 | 3.7%
T | 11 | 3.1%
Other values (15) | 87 | 24.8%


Best Response
Text

Missing 

Distinct: 5
Distinct (%): 71.4%
Missing: 15
Missing (%): 68.2%
Memory size: 308.0 B

Length

Max length: 14
Median length: 13
Mean length: 8
Min length: 2

Characters and Unicode

Total characters: 56
Distinct characters: 24
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 3
Unique (%): 42.9%

Sample

1st row: PR
2nd row: Stable Disease
3rd row: Stable Disease
4th row: <Unknown>
5th row: Non-evaluable

Word | Count | Frequency (%)
pr | 2 | 22.2%
stable | 2 | 22.2%
disease | 2 | 22.2%
unknown | 1 | 11.1%
non-evaluable | 1 | 11.1%
cr | 1 | 11.1%

Most occurring characters

Character | Count | Frequency (%)
e | 8 | 14.3%
a | 6 | 10.7%
l | 4 | 7.1%
s | 4 | 7.1%
n | 4 | 7.1%
R | 3 | 5.4%
b | 3 | 5.4%
t | 2 | 3.6%
S | 2 | 3.6%
P | 2 | 3.6%
Other values (14) | 18 | 32.1%


Self-ReportedRace
Categorical

High correlation 

Distinct: 3
Distinct (%): 13.6%
Missing: 0
Missing (%): 0.0%
Memory size: 308.0 B

Length

Max length: 25
Median length: 5
Mean length: 10.5
Min length: 5

Characters and Unicode

Total characters: 231
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: White
2nd row: White
3rd row: White
4th row: White
5th row: White

Common Values

Value | Count | Frequency (%)
White | 14 | 63.6%
Black or African American | 5 | 22.7%
Not Provided | 3 | 13.6%

Length

Histogram of lengths of the category

Word | Count | Frequency (%)
white | 14 | 35.0%
black | 5 | 12.5%
or | 5 | 12.5%
african | 5 | 12.5%
american | 5 | 12.5%
not | 3 | 7.5%
provided | 3 | 7.5%
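
The word frequencies for Self-ReportedRace can be rebuilt from the category counts. This sketch assumes the words are produced by lowercasing each value and splitting on whitespace; that reproduces the figures here but is an assumption about the tool, not something the report states. Variable names are illustrative.

```python
from collections import Counter

# Category counts copied from the Common Values table.
category_counts = {
    "White": 14,
    "Black or African American": 5,
    "Not Provided": 3,
}

# Assumed tokenisation: lowercase, split on whitespace, weight by row count.
word_counts = Counter()
for value, n in category_counts.items():
    for word in value.lower().split():
        word_counts[word] += n

total_words = sum(word_counts.values())
word_pct = {w: round(100 * c / total_words, 1) for w, c in word_counts.items()}

print(total_words, word_pct)
```

Note that the percentages are shares of all words (40 here), not of the 22 rows, which is why "white" shows 35.0% in the word table but 63.6% in Common Values.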

Most occurring characters

Character | Count | Frequency (%)
i | 27 | 11.7%
e | 22 | 9.5%
r | 18 | 7.8%
(space) | 18 | 7.8%
t | 17 | 7.4%
c | 15 | 6.5%
a | 15 | 6.5%
h | 14 | 6.1%
W | 14 | 6.1%
o | 11 | 4.8%
Other values (11) | 60 | 26.0%


Self-ReportedEthnicity
Categorical

High correlation 

Distinct: 2
Distinct (%): 9.1%
Missing: 0
Missing (%): 0.0%
Memory size: 308.0 B

Length

Max length: 22
Median length: 22
Mean length: 20.636364
Min length: 12

Characters and Unicode

Total characters: 454
Distinct characters: 17
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Not Hispanic or Latino
2nd row: Not Hispanic or Latino
3rd row: Not Hispanic or Latino
4th row: Not Hispanic or Latino
5th row: Not Hispanic or Latino

Common Values

Value | Count | Frequency (%)
Not Hispanic or Latino | 19 | 86.4%
Not Provided | 3 | 13.6%

Length

Histogram of lengths of the category

Word | Count | Frequency (%)
not | 22 | 26.8%
hispanic | 19 | 23.2%
or | 19 | 23.2%
latino | 19 | 23.2%
provided | 3 | 3.7%

Most occurring characters

Character | Count | Frequency (%)
o | 63 | 13.9%
i | 60 | 13.2%
(space) | 60 | 13.2%
t | 41 | 9.0%
a | 38 | 8.4%
n | 38 | 8.4%
N | 22 | 4.8%
r | 22 | 4.8%
H | 19 | 4.2%
p | 19 | 4.2%
Other values (7) | 72 | 15.9%


BiologicalSex
Categorical

High correlation 

Distinct: 2
Distinct (%): 9.1%
Missing: 0
Missing (%): 0.0%
Memory size: 308.0 B

Length

Max length: 6
Median length: 4
Mean length: 4.9090909
Min length: 4

Characters and Unicode

Total characters: 108
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male
2nd row: Male
3rd row: Male
4th row: Male
5th row: Female

Common Values

Value | Count | Frequency (%)
Male | 12 | 54.5%
Female | 10 | 45.5%

Length

Histogram of lengths of the category

Word | Count | Frequency (%)
male | 12 | 54.5%
female | 10 | 45.5%

Most occurring characters

Character | Count | Frequency (%)
e | 32 | 29.6%
a | 22 | 20.4%
l | 22 | 20.4%
M | 12 | 11.1%
F | 10 | 9.3%
m | 10 | 9.3%


Age atDiagnosis
Real number (ℝ)

High correlation 

Distinct: 9
Distinct (%): 40.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 50.272727
Minimum: 31
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 308.0 B

Quantile statistics

Minimum: 31
5-th percentile: 31
Q1: 39
median: 52
Q3: 57.75
95-th percentile: 79.3
Maximum: 80
Range: 49
Interquartile range (IQR): 18.75

Descriptive statistics

Standard deviation: 14.836191
Coefficient of variation (CV): 0.2951141
Kurtosis: -0.49114955
Mean: 50.272727
Median Absolute Deviation (MAD): 13
Skewness: 0.44699848
Sum: 1106
Variance: 220.11255
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
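Common values

Value | Count | Frequency (%)
39 | 4 | 18.2%
57 | 3 | 13.6%
31 | 3 | 13.6%
58 | 2 | 9.1%
35 | 2 | 9.1%
53 | 2 | 9.1%
51 | 2 | 9.1%
66 | 2 | 9.1%
80 | 2 | 9.1%

Minimum 10 values

Value | Count | Frequency (%)
31 | 3 | 13.6%
35 | 2 | 9.1%
39 | 4 | 18.2%
51 | 2 | 9.1%
53 | 2 | 9.1%
57 | 3 | 13.6%
58 | 2 | 9.1%
66 | 2 | 9.1%
80 | 2 | 9.1%

Maximum 10 values

Value | Count | Frequency (%)
80 | 2 | 9.1%
66 | 2 | 9.1%
58 | 2 | 9.1%
57 | 3 | 13.6%
53 | 2 | 9.1%
51 | 2 | 9.1%
39 | 4 | 18.2%
35 | 2 | 9.1%
31 | 3 | 13.6%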
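
The quantile statistics for Age atDiagnosis can be recovered from the frequency table. This sketch assumes the report interpolates linearly between order statistics; `statistics.quantiles` with `method="inclusive"` implements that rule and reproduces the printed quartiles for these values. Variable names are illustrative.

```python
import statistics

# (age, count) pairs copied from the Age atDiagnosis frequency table.
value_counts = [(31, 3), (35, 2), (39, 4), (51, 2), (53, 2),
                (57, 3), (58, 2), (66, 2), (80, 2)]

# Expand into the 22 individual observations, sorted ascending.
ages = sorted(v for v, n in value_counts for _ in range(n))

# Quartiles with linear interpolation between neighbouring observations.
q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")
iqr = q3 - q1

# The 5th and 95th percentiles are the first and last of 19 cut points.
ventiles = statistics.quantiles(ages, n=20, method="inclusive")
p05, p95 = ventiles[0], ventiles[-1]

print(q1, median, q3, iqr, p05, p95)
```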

AdditionalMedicalHistory
Text

Missing

Distinct: 7
Distinct (%): 50.0%
Missing: 8
Missing (%): 36.4%
Memory size: 308.0 B

Length

Max length: 155
Median length: 86
Mean length: 85.285714
Min length: 33

Characters and Unicode

Total characters: 1194
Distinct characters: 49
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Final Pathology: Dx confirmed; mitotic rate is 47/30 HPF; treatment effect present w/central necrosis.
2nd row: Final Pathology: Dx confirmed; mitotic rate is 47/30 HPF; treatment effect present w/central necrosis.
3rd row: Concurrent malignancy - SDHB gene mutation associated paraganglioma
4th row: Concurrent malignancy - SDHB gene mutation associated paraganglioma
5th row: Final Pathology: Dx confirmed with treatment effect present; 50% viable tumor remains

Word | Count | Frequency (%)
final | 6 | 3.9%
pathology | 6 | 3.9%
dx | 6 | 3.9%
confirmed | 6 | 3.9%
mitotic | 4 | 2.6%
treatment | 4 | 2.6%
effect | 4 | 2.6%
malignancy | 4 | 2.6%
(blank) | 4 | 2.6%
tumor | 4 | 2.6%
Other values (49) | 104 | 68.4%

Most occurring characters

Character | Count | Frequency (%)
(space) | 144 | 12.1%
e | 98 | 8.2%
t | 96 | 8.0%
a | 92 | 7.7%
r | 78 | 6.5%
i | 76 | 6.4%
n | 74 | 6.2%
o | 74 | 6.2%
l | 52 | 4.4%
c | 50 | 4.2%
Other values (39) | 360 | 30.2%


Date ofDiagnosis
Categorical

High correlation  Uniform 

Distinct: 10
Distinct (%): 45.5%
Missing: 0
Missing (%): 0.0%
Memory size: 308.0 B

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 154
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 07/2020
2nd row: 07/2020
3rd row: 11/2017
4th row: 11/2017
5th row: 12/2017

Common Values

Value | Count | Frequency (%)
03/2014 | 3 | 13.6%
07/2009 | 3 | 13.6%
11/2017 | 2 | 9.1%
07/2020 | 2 | 9.1%
12/2017 | 2 | 9.1%
01/2007 | 2 | 9.1%
10/2013 | 2 | 9.1%
02/2017 | 2 | 9.1%
06/2017 | 2 | 9.1%
10/2015 | 2 | 9.1%

Length

Histogram of lengths of the category

Word | Count | Frequency (%)
03/2014 | 3 | 13.6%
07/2009 | 3 | 13.6%
11/2017 | 2 | 9.1%
07/2020 | 2 | 9.1%
12/2017 | 2 | 9.1%
01/2007 | 2 | 9.1%
10/2013 | 2 | 9.1%
02/2017 | 2 | 9.1%
06/2017 | 2 | 9.1%
10/2015 | 2 | 9.1%

Most occurring characters

Character | Count | Frequency (%)
0 | 47 | 30.5%
2 | 28 | 18.2%
1 | 27 | 17.5%
/ | 22 | 14.3%
7 | 15 | 9.7%
3 | 5 | 3.2%
4 | 3 | 1.9%
9 | 3 | 1.9%
6 | 2 | 1.3%
5 | 2 | 1.3%


Grade StageInformation
Categorical

High correlation 

Distinct: 5
Distinct (%): 22.7%
Missing: 0
Missing (%): 0.0%
Memory size: 308.0 B

Length

Max length: 25
Median length: 23
Mean length: 14.590909
Min length: 10

Characters and Unicode

Total characters: 321
Distinct characters: 23
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Grade, TNM (Clinical)
2nd row: Grade, TNM (Clinical)
3rd row: None Provided
4th row: None Provided
5th row: Grade, TNM (Pathological)

Common Values

Value | Count | Frequency (%)
None Provided | 11 | 50.0%
Grade, TNM | 5 | 22.7%
Grade, TNM (Clinical) | 2 | 9.1%
Grade, TNM (Pathological) | 2 | 9.1%
TNM (Pathological) | 2 | 9.1%

Length

Histogram of lengths of the category

Word | Count | Frequency (%)
none | 11 | 22.9%
provided | 11 | 22.9%
tnm | 11 | 22.9%
grade | 9 | 18.8%
pathological | 4 | 8.3%
clinical | 2 | 4.2%

Most occurring characters

Character | Count | Frequency (%)
e | 31 | 9.7%
d | 31 | 9.7%
o | 30 | 9.3%
(space) | 26 | 8.1%
N | 22 | 6.9%
r | 20 | 6.2%
i | 19 | 5.9%
a | 19 | 5.9%
P | 15 | 4.7%
n | 13 | 4.0%
Other values (13) | 95 | 29.6%


Has KnownMetastaticDisease
Categorical

High correlation 

Distinct: 2
Distinct (%): 9.1%
Missing: 0
Missing (%): 0.0%
Memory size: 308.0 B

Length

Max length: 12
Median length: 12
Mean length: 8.7272727
Min length: 3

Characters and Unicode

Total characters: 192
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Yes
2nd row: Yes
3rd row: Not Reported
4th row: Not Reported
5th row: Not Reported

Common Values

Value | Count | Frequency (%)
Not Reported | 14 | 63.6%
Yes | 8 | 36.4%

Length

Histogram of lengths of the category

Word | Count | Frequency (%)
not | 14 | 38.9%
reported | 14 | 38.9%
yes | 8 | 22.2%

Most occurring characters

Character | Count | Frequency (%)
e | 36 | 18.8%
t | 28 | 14.6%
o | 28 | 14.6%
N | 14 | 7.3%
(space) | 14 | 7.3%
R | 14 | 7.3%
p | 14 | 7.3%
r | 14 | 7.3%
d | 14 | 7.3%
Y | 8 | 4.2%


Has Smoked100 Cigarettes
Categorical

High correlation 

Distinct: 3
Distinct (%): 13.6%
Missing: 0
Missing (%): 0.0%
Memory size: 308.0 B

Length

Max length: 12
Median length: 7.5
Mean length: 3.7272727
Min length: 2

Characters and Unicode

Total characters: 82
Distinct characters: 12
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: No
3rd row: No
4th row: No
5th row: Yes

Common Values

Value | Count | Frequency (%)
No | 11 | 50.0%
Yes | 8 | 36.4%
Not Provided | 3 | 13.6%

Length

Histogram of lengths of the category

Word | Count | Frequency (%)
no | 11 | 44.0%
yes | 8 | 32.0%
not | 3 | 12.0%
provided | 3 | 12.0%

Most occurring characters

Character | Count | Frequency (%)
o | 17 | 20.7%
N | 14 | 17.1%
e | 11 | 13.4%
Y | 8 | 9.8%
s | 8 | 9.8%
d | 6 | 7.3%
(space) | 3 | 3.7%
t | 3 | 3.7%
P | 3 | 3.7%
r | 3 | 3.7%
Other values (2) | 6 | 7.3%


Molecular andIHC Data
Categorical

High correlation  Missing 

Distinct: 5
Distinct (%): 45.5%
Missing: 11
Missing (%): 50.0%
Memory size: 308.0 B

Length

Max length: 152
Median length: 43
Mean length: 67.363636
Min length: 33

Characters and Unicode

Total characters: 741
Distinct characters: 64
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: IHC: DOG1+, CD117+; S100-, CK AE1/AE3-
2nd row: IHC: DOG1+, CD117+; S100-, CK AE1/AE3-
3rd row: c.445C>T (p.Q149*) variant in the SDHB gene
4th row: c.445C>T (p.Q149*) variant in the SDHB gene
5th row: IHC (from 10/2013 primary diagnosis): CKIT+, CD34+, BCL2+, CDX2 -, CK7 -, CK20 -, Melan-A -, S-100 -, Vimentin -. PDGFRA exon 12 &18 mutation negative.

Common Values

Value | Count | Frequency (%)
Biomarkers: c-KIT mutated, DOG1+ | 3 | 13.6%
c.445C>T (p.Q149*) variant in the SDHB gene | 2 | 9.1%
IHC: DOG1+, CD117+; S100-, CK AE1/AE3- | 2 | 9.1%
IHC (from 10/2013 primary diagnosis): CKIT+, CD34+, BCL2+, CDX2 -, CK7 -, CK20 -, Melan-A -, S-100 -, Vimentin -. PDGFRA exon 12 &18 mutation negative. | 2 | 9.1%
IHC: DOG1+, CD117+, and PDGFR+ consistent with a gastrointestinal stromal tumor (GIST). | 2 | 9.1%
(Missing) | 11 | 50.0%

Length

Histogram of lengths of the category

Word | Count | Frequency (%)
(blank) | 12 | 10.5%
dog1 | 7 | 6.1%
ihc | 6 | 5.3%
cd117 | 4 | 3.5%
biomarkers | 3 | 2.6%
mutated | 3 | 2.6%
c-kit | 3 | 2.6%
p.q149 | 2 | 1.8%
c.445c>t | 2 | 1.8%
the | 2 | 1.8%
Other values (35) | 70 | 61.4%

Most occurring characters

Character | Count | Frequency (%)
(space) | 110 | 14.8%
t | 34 | 4.6%
1 | 31 | 4.2%
a | 30 | 4.0%
n | 30 | 4.0%
i | 29 | 3.9%
, | 27 | 3.6%
C | 26 | 3.5%
e | 26 | 3.5%
- | 23 | 3.1%
Other values (54) | 375 | 50.6%


Occupation
Categorical

High correlation  Missing  Uniform 

Distinct: 9
Distinct (%): 47.4%
Missing: 3
Missing (%): 13.6%
Memory size: 308.0 B
Not Provided
insurance industry
Engineer
Department Manger
Disabled
Other values (4)
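
The percentages in this summary can be reproduced from the counts. The figures are consistent with Missing (%) being taken over all 22 observations and Distinct (%) over the non-missing observations only; that reading is inferred from the numbers in this report, not from the tool's documentation. A minimal Python sketch, with hypothetical helper names:

```python
def missing_pct(missing: int, n_rows: int) -> float:
    """Share of rows with no value, as a percentage rounded to one decimal."""
    return round(100 * missing / n_rows, 1)


def distinct_pct(distinct: int, missing: int, n_rows: int) -> float:
    """Distinct values as a percentage of the non-missing rows (assumed denominator)."""
    return round(100 * distinct / (n_rows - missing), 1)


if __name__ == "__main__":
    # Occupation: 9 distinct values, 3 missing, 22 observations.
    print(missing_pct(3, 22), distinct_pct(9, 3, 22))
```

The same arithmetic matches the other variables in this section, for example PatientNotes (7 distinct, 7 missing) and Specimen ID (10 distinct, none missing).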

Length

Max length: 39
Median length: 17
Mean length: 15.947368
Min length: 7

Characters and Unicode

Total characters: 303
Distinct characters: 31
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Engineer
2nd row: Engineer
3rd row: insurance industry
4th row: insurance industry
5th row: Department Manger

Common Values

Value | Count | Frequency (%)
Not Provided | 3 | 13.6%
insurance industry | 2 | 9.1%
Engineer | 2 | 9.1%
Department Manger | 2 | 9.1%
Disabled | 2 | 9.1%
Unknown | 2 | 9.1%
Park Service | 2 | 9.1%
Retired; prior occupation not provided | 2 | 9.1%
Plastics factory worker | 2 | 9.1%
(Missing) | 3 | 13.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 5
 
12.5%
provided 5
 
12.5%
insurance 2
 
5.0%
industry 2
 
5.0%
engineer 2
 
5.0%
department 2
 
5.0%
manger 2
 
5.0%
disabled 2
 
5.0%
unknown 2
 
5.0%
park 2
 
5.0%
Other values (7) 14
35.0%

Most occurring characters

ValueCountFrequency (%)
r 31
 
10.2%
e 29
 
9.6%
26
 
8.6%
n 24
 
7.9%
i 23
 
7.6%
o 22
 
7.3%
t 19
 
6.3%
a 16
 
5.3%
d 16
 
5.3%
c 12
 
4.0%
Other values (21) 85
28.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 303
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
r 31
 
10.2%
e 29
 
9.6%
26
 
8.6%
n 24
 
7.9%
i 23
 
7.6%
o 22
 
7.3%
t 19
 
6.3%
a 16
 
5.3%
d 16
 
5.3%
c 12
 
4.0%
Other values (21) 85
28.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 303
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
r 31
 
10.2%
e 29
 
9.6%
26
 
8.6%
n 24
 
7.9%
i 23
 
7.6%
o 22
 
7.3%
t 19
 
6.3%
a 16
 
5.3%
d 16
 
5.3%
c 12
 
4.0%
Other values (21) 85
28.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 303
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
r 31
 
10.2%
e 29
 
9.6%
26
 
8.6%
n 24
 
7.9%
i 23
 
7.6%
o 22
 
7.3%
t 19
 
6.3%
a 16
 
5.3%
d 16
 
5.3%
c 12
 
4.0%
Other values (21) 85
28.1%

PatientNotes
Categorical

High correlation  Missing 

Distinct: 7
Distinct (%): 46.7%
Missing: 7
Missing (%): 31.8%
Memory size: 308.0 B
Disease recurrence at distant site Tumor Grade/Stage: Grade 2, T2N0M1
Pt has concurrent malignancies - GIST and SDHB gene mutation associated paraganglioma
Tumor Grade/Stage: cM0 (at diagnosis); high grade Location of known metastasis: liver, jejunum, ileum
Tumor Stage/Grade: G1 - 1 low grade; pT3 pN0 pM not applicable
Tumor Grade/Stage: High grade, T4NXM1
Other values (2)

Length

Max length: 102
Median length: 62
Mean length: 60.8
Min length: 26

Characters and Unicode

Total characters: 912
Distinct characters: 48
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Tumor Grade/Stage: cM0 (at diagnosis); high grade Location of known metastasis: liver, jejunum, ileum
2nd row: Tumor Grade/Stage: cM0 (at diagnosis); high grade Location of known metastasis: liver, jejunum, ileum
3rd row: Pt has concurrent malignancies - GIST and SDHB gene mutation associated paraganglioma
4th row: Pt has concurrent malignancies - GIST and SDHB gene mutation associated paraganglioma
5th row: Tumor Stage/Grade: G1 - 1 low grade; pT3 pN0 pM not applicable

Common Values

Value | Count | Frequency (%)
Disease recurrence at distant site Tumor Grade/Stage: Grade 2, T2N0M1 | 3 | 13.6%
Pt has concurrent malignancies - GIST and SDHB gene mutation associated paraganglioma | 2 | 9.1%
Tumor Grade/Stage: cM0 (at diagnosis); high grade Location of known metastasis: liver, jejunum, ileum | 2 | 9.1%
Tumor Stage/Grade: G1 - 1 low grade; pT3 pN0 pM not applicable | 2 | 9.1%
Tumor Grade/Stage: High grade, T4NXM1 | 2 | 9.1%
Tumor Grade/Stage: pT2NX | 2 | 9.1%
Location of know metastases: liver | 2 | 9.1%
(Missing) | 7 | 31.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
tumor 11
 
8.3%
grade/stage 9
 
6.8%
grade 9
 
6.8%
at 5
 
3.8%
4
 
3.0%
high 4
 
3.0%
location 4
 
3.0%
of 4
 
3.0%
liver 4
 
3.0%
site 3
 
2.3%
Other values (35) 75
56.8%

Most occurring characters

ValueCountFrequency (%)
116
 
12.7%
a 82
 
9.0%
e 75
 
8.2%
r 50
 
5.5%
t 49
 
5.4%
i 41
 
4.5%
o 41
 
4.5%
n 38
 
4.2%
s 36
 
3.9%
g 31
 
3.4%
Other values (38) 353
38.7%

Most occurring categories

ValueCountFrequency (%)
(unknown) 912
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
116
 
12.7%
a 82
 
9.0%
e 75
 
8.2%
r 50
 
5.5%
t 49
 
5.4%
i 41
 
4.5%
o 41
 
4.5%
n 38
 
4.2%
s 36
 
3.9%
g 31
 
3.4%
Other values (38) 353
38.7%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 912
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
116
 
12.7%
a 82
 
9.0%
e 75
 
8.2%
r 50
 
5.5%
t 49
 
5.4%
i 41
 
4.5%
o 41
 
4.5%
n 38
 
4.2%
s 36
 
3.9%
g 31
 
3.4%
Other values (38) 353
38.7%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 912
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
116
 
12.7%
a 82
 
9.0%
e 75
 
8.2%
r 50
 
5.5%
t 49
 
5.4%
i 41
 
4.5%
o 41
 
4.5%
n 38
 
4.2%
s 36
 
3.9%
g 31
 
3.4%
Other values (38) 353
38.7%

Specimen ID
Categorical

High correlation  Uniform 

Distinct: 10
Distinct (%): 45.5%
Missing: 0
Missing (%): 0.0%
Memory size: 308.0 B
008-R
196-R
235-R1
319-R
109-R
Other values (5)
10 

Length

Max length: 6
Median length: 5
Mean length: 5.0909091
Min length: 5

Characters and Unicode

Total characters: 112
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 319-R
2nd row: 319-R
3rd row: 235-R1
4th row: 235-R1
5th row: 109-R

Common Values

Value | Count | Frequency (%)
008-R | 3 | 13.6%
196-R | 3 | 13.6%
235-R1 | 2 | 9.1%
319-R | 2 | 9.1%
109-R | 2 | 9.1%
202-R | 2 | 9.1%
082-R | 2 | 9.1%
101-R | 2 | 9.1%
194-R | 2 | 9.1%
013-R | 2 | 9.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
008-r 3
13.6%
196-r 3
13.6%
235-r1 2
9.1%
319-r 2
9.1%
109-r 2
9.1%
202-r 2
9.1%
082-r 2
9.1%
101-r 2
9.1%
194-r 2
9.1%
013-r 2
9.1%

Most occurring characters

ValueCountFrequency (%)
- 22
19.6%
R 22
19.6%
1 17
15.2%
0 16
14.3%
9 9
8.0%
2 8
 
7.1%
3 6
 
5.4%
8 5
 
4.5%
6 3
 
2.7%
5 2
 
1.8%

Most occurring categories

ValueCountFrequency (%)
(unknown) 112
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
- 22
19.6%
R 22
19.6%
1 17
15.2%
0 16
14.3%
9 9
8.0%
2 8
 
7.1%
3 6
 
5.4%
8 5
 
4.5%
6 3
 
2.7%
5 2
 
1.8%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 112
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
- 22
19.6%
R 22
19.6%
1 17
15.2%
0 16
14.3%
9 9
8.0%
2 8
 
7.1%
3 6
 
5.4%
8 5
 
4.5%
6 3
 
2.7%
5 2
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 112
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
- 22
19.6%
R 22
19.6%
1 17
15.2%
0 16
14.3%
9 9
8.0%
2 8
 
7.1%
3 6
 
5.4%
8 5
 
4.5%
6 3
 
2.7%
5 2
 
1.8%

BiopsySite
Categorical

High correlation 

Distinct: 9
Distinct (%): 40.9%
Missing: 0
Missing (%): 0.0%
Memory size: 308.0 B
Stomach
Stomach [distal]
Stomach/Liver
Liver [central]
Gastric
Other values (4)

Length

Max length: 16
Median length: 14
Mean length: 11.863636
Min length: 7

Characters and Unicode

Total characters: 261
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Liver [central]
2nd row: Liver [central]
3rd row: Gastric
4th row: Gastric
5th row: Stomach

Common Values

Value | Count | Frequency (%)
Stomach | 4 | 18.2%
Stomach [distal] | 3 | 13.6%
Stomach/Liver | 3 | 13.6%
Liver [central] | 2 | 9.1%
Gastric | 2 | 9.1%
Gastric Fundus | 2 | 9.1%
Abdominal Mass | 2 | 9.1%
abdominal mass | 2 | 9.1%
Stomach | 2 | 9.1%
9.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
stomach 9
27.3%
abdominal 4
12.1%
mass 4
12.1%
gastric 4
12.1%
distal 3
 
9.1%
stomach/liver 3
 
9.1%
liver 2
 
6.1%
central 2
 
6.1%
fundus 2
 
6.1%

Most occurring characters

ValueCountFrequency (%)
a 31
 
11.9%
t 21
 
8.0%
c 18
 
6.9%
m 18
 
6.9%
s 17
 
6.5%
o 16
 
6.1%
i 16
 
6.1%
15
 
5.7%
S 12
 
4.6%
h 12
 
4.6%
Other values (16) 85
32.6%

Most occurring categories

ValueCountFrequency (%)
(unknown) 261
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
a 31
 
11.9%
t 21
 
8.0%
c 18
 
6.9%
m 18
 
6.9%
s 17
 
6.5%
o 16
 
6.1%
i 16
 
6.1%
15
 
5.7%
S 12
 
4.6%
h 12
 
4.6%
Other values (16) 85
32.6%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 261
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
a 31
 
11.9%
t 21
 
8.0%
c 18
 
6.9%
m 18
 
6.9%
s 17
 
6.5%
o 16
 
6.1%
i 16
 
6.1%
15
 
5.7%
S 12
 
4.6%
h 12
 
4.6%
Other values (16) 85
32.6%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 261
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
a 31
 
11.9%
t 21
 
8.0%
c 18
 
6.9%
m 18
 
6.9%
s 17
 
6.5%
o 16
 
6.1%
i 16
 
6.1%
15
 
5.7%
S 12
 
4.6%
h 12
 
4.6%
Other values (16) 85
32.6%

PDX GrowthCurve Avail
Boolean

High correlation 

Distinct: 2
Distinct (%): 9.1%
Missing: 0
Missing (%): 0.0%
Memory size: 154.0 B
True
16 
False
Value | Count | Frequency (%)
True | 16 | 72.7%
False | 6 | 27.3%

MSI Status
Categorical

High correlation 

Distinct: 2
Distinct (%): 9.1%
Missing: 0
Missing (%): 0.0%
Memory size: 308.0 B
MSI-Stable
16 
Unknown

Length

Max length: 10
Median length: 10
Mean length: 9.1818182
Min length: 7

Characters and Unicode

Total characters: 202
Distinct characters: 14
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: MSI-Stable
2nd row: MSI-Stable
3rd row: Unknown
4th row: Unknown
5th row: Unknown

Common Values

Value | Count | Frequency (%)
MSI-Stable | 16 | 72.7%
Unknown | 6 | 27.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
msi-stable 16
72.7%
unknown 6
 
27.3%

Most occurring characters

ValueCountFrequency (%)
S 32
15.8%
n 18
8.9%
I 16
7.9%
- 16
7.9%
t 16
7.9%
M 16
7.9%
a 16
7.9%
b 16
7.9%
e 16
7.9%
l 16
7.9%
Other values (4) 24
11.9%

Most occurring categories

ValueCountFrequency (%)
(unknown) 202
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
S 32
15.8%
n 18
8.9%
I 16
7.9%
- 16
7.9%
t 16
7.9%
M 16
7.9%
a 16
7.9%
b 16
7.9%
e 16
7.9%
l 16
7.9%
Other values (4) 24
11.9%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 202
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
S 32
15.8%
n 18
8.9%
I 16
7.9%
- 16
7.9%
t 16
7.9%
M 16
7.9%
a 16
7.9%
b 16
7.9%
e 16
7.9%
l 16
7.9%
Other values (4) 24
11.9%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 202
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
S 32
15.8%
n 18
8.9%
I 16
7.9%
- 16
7.9%
t 16
7.9%
M 16
7.9%
a 16
7.9%
b 16
7.9%
e 16
7.9%
l 16
7.9%
Other values (4) 24
11.9%

CollectionDate
Categorical

High correlation  Uniform 

Distinct: 10
Distinct (%): 45.5%
Missing: 0
Missing (%): 0.0%
Memory size: 308.0 B
01/2015
07/2019
08/2018
11/2021
04/2018
Other values (5)
10 

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 154
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 11/2021
2nd row: 11/2021
3rd row: 08/2018
4th row: 08/2018
5th row: 04/2018

Common Values

Value | Count | Frequency (%)
01/2015 | 3 | 13.6%
07/2019 | 3 | 13.6%
08/2018 | 2 | 9.1%
11/2021 | 2 | 9.1%
04/2018 | 2 | 9.1%
07/2016 | 2 | 9.1%
03/2016 | 2 | 9.1%
04/2017 | 2 | 9.1%
07/2017 | 2 | 9.1%
01/2016 | 2 | 9.1%
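
CollectionDate is profiled as Categorical because its values are `MM/YYYY` strings. A minimal Python sketch of parsing them into dates so that they sort chronologically follows. It assumes every value follows that pattern (as the ten values listed here do) and pins each date to the first day of the month; `parse_month` is a hypothetical helper.

```python
from datetime import date, datetime

# The ten distinct CollectionDate values listed in this report.
COLLECTION_DATES = [
    "01/2015", "07/2019", "08/2018", "11/2021", "04/2018",
    "07/2016", "03/2016", "04/2017", "07/2017", "01/2016",
]


def parse_month(value: str) -> date:
    """Parse an 'MM/YYYY' string; the day defaults to the 1st of the month."""
    return datetime.strptime(value, "%m/%Y").date()


if __name__ == "__main__":
    parsed = sorted(parse_month(v) for v in COLLECTION_DATES)
    print(parsed[0], parsed[-1])  # earliest and latest collection month
```

Sorting the raw strings instead would order them by month number first, so `01/2016` would come before `07/2015`.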

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
01/2015 3
13.6%
07/2019 3
13.6%
08/2018 2
9.1%
11/2021 2
9.1%
04/2018 2
9.1%
07/2016 2
9.1%
03/2016 2
9.1%
04/2017 2
9.1%
07/2017 2
9.1%
01/2016 2
9.1%

Most occurring characters

ValueCountFrequency (%)
0 42
27.3%
1 31
20.1%
2 24
15.6%
/ 22
14.3%
7 11
 
7.1%
6 6
 
3.9%
8 6
 
3.9%
4 4
 
2.6%
5 3
 
1.9%
9 3
 
1.9%

Most occurring categories

ValueCountFrequency (%)
(unknown) 154
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
0 42
27.3%
1 31
20.1%
2 24
15.6%
/ 22
14.3%
7 11
 
7.1%
6 6
 
3.9%
8 6
 
3.9%
4 4
 
2.6%
5 3
 
1.9%
9 3
 
1.9%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 154
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
0 42
27.3%
1 31
20.1%
2 24
15.6%
/ 22
14.3%
7 11
 
7.1%
6 6
 
3.9%
8 6
 
3.9%
4 4
 
2.6%
5 3
 
1.9%
9 3
 
1.9%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 154
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
0 42
27.3%
1 31
20.1%
2 24
15.6%
/ 22
14.3%
7 11
 
7.1%
6 6
 
3.9%
8 6
 
3.9%
4 4
 
2.6%
5 3
 
1.9%
9 3
 
1.9%

Human PathogenTesting Summary
Categorical

High correlation 

Distinct: 4
Distinct (%): 18.2%
Missing: 0
Missing (%): 0.0%
Memory size: 308.0 B
Negative
15 
Negative
Negative
Negative

Length

Max length: 10
Median length: 8
Mean length: 8.4545455
Min length: 8

Characters and Unicode

Total characters: 186
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Negative
2nd row: Negative
3rd row: Negative
4th row: Negative
5th row: Negative

Common Values

Value | Count | Frequency (%)
Negative | 15 | 68.2%
Negative | 3 | 13.6%
Negative | 2 | 9.1%
Negative | 2 | 9.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
negative 22
100.0%

Most occurring characters

ValueCountFrequency (%)
e 44
23.7%
N 22
11.8%
g 22
11.8%
a 22
11.8%
t 22
11.8%
i 22
11.8%
v 22
11.8%
5
 
2.7%
5
 
2.7%

Most occurring categories

ValueCountFrequency (%)
(unknown) 186
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
e 44
23.7%
N 22
11.8%
g 22
11.8%
a 22
11.8%
t 22
11.8%
i 22
11.8%
v 22
11.8%
5
 
2.7%
5
 
2.7%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 186
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
e 44
23.7%
N 22
11.8%
g 22
11.8%
a 22
11.8%
t 22
11.8%
i 22
11.8%
v 22
11.8%
5
 
2.7%
5
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 186
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
e 44
23.7%
N 22
11.8%
g 22
11.8%
a 22
11.8%
t 22
11.8%
i 22
11.8%
v 22
11.8%
5
 
2.7%
5
 
2.7%

Able to ViablyPassage in nude mice
Categorical

High correlation 

Distinct: 3
Distinct (%): 13.6%
Missing: 0
Missing (%): 0.0%
Memory size: 308.0 B
Unknown
11 
Yes
No

Length

Max length: 7
Median length: 5
Mean length: 4.8636364
Min length: 2

Characters and Unicode

Total characters: 107
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Yes
2nd row: Yes
3rd row: Unknown
4th row: Unknown
5th row: Unknown

Common Values

Value | Count | Frequency (%)
Unknown | 11 | 50.0%
Yes | 8 | 36.4%
No | 3 | 13.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
unknown 11
50.0%
yes 8
36.4%
no 3
 
13.6%

Most occurring characters

ValueCountFrequency (%)
n 33
30.8%
o 14
13.1%
U 11
 
10.3%
k 11
 
10.3%
w 11
 
10.3%
Y 8
 
7.5%
e 8
 
7.5%
s 8
 
7.5%
N 3
 
2.8%

Most occurring categories

ValueCountFrequency (%)
(unknown) 107
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
n 33
30.8%
o 14
13.1%
U 11
 
10.3%
k 11
 
10.3%
w 11
 
10.3%
Y 8
 
7.5%
e 8
 
7.5%
s 8
 
7.5%
N 3
 
2.8%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 107
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
n 33
30.8%
o 14
13.1%
U 11
 
10.3%
k 11
 
10.3%
w 11
 
10.3%
Y 8
 
7.5%
e 8
 
7.5%
s 8
 
7.5%
N 3
 
2.8%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 107
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
n 33
30.8%
o 14
13.1%
U 11
 
10.3%
k 11
 
10.3%
w 11
 
10.3%
Y 8
 
7.5%
e 8
 
7.5%
s 8
 
7.5%
N 3
 
2.8%

Age atSampling
Real number (ℝ)

High correlation 

Distinct: 9
Distinct (%): 40.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 53.045455
Minimum: 32
Maximum: 81
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 308.0 B

Quantile statistics

Minimum: 32
5-th percentile: 32
Q1: 39
median: 55.5
Q3: 67
95-th percentile: 80.4
Maximum: 81
Range: 49
Interquartile range (IQR): 28

Descriptive statistics

Standard deviation: 16.220144
Coefficient of variation (CV): 0.30577821
Kurtosis: -1.2968361
Mean: 53.045455
Median Absolute Deviation (MAD): 15
Skewness: 0.14859199
Sum: 1167
Variance: 263.09307
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
Value | Count | Frequency (%)
39 | 4 | 18.2%
67 | 3 | 13.6%
32 | 3 | 13.6%
59 | 2 | 9.1%
36 | 2 | 9.1%
60 | 2 | 9.1%
52 | 2 | 9.1%
69 | 2 | 9.1%
81 | 2 | 9.1%

Value | Count | Frequency (%)
32 | 3 | 13.6%
36 | 2 | 9.1%
39 | 4 | 18.2%
52 | 2 | 9.1%
59 | 2 | 9.1%
60 | 2 | 9.1%
67 | 3 | 13.6%
69 | 2 | 9.1%
81 | 2 | 9.1%

Value | Count | Frequency (%)
81 | 2 | 9.1%
69 | 2 | 9.1%
67 | 3 | 13.6%
60 | 2 | 9.1%
59 | 2 | 9.1%
52 | 2 | 9.1%
39 | 4 | 18.2%
36 | 2 | 9.1%
32 | 3 | 13.6%
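
The quantile and descriptive statistics reported for Age atSampling can be reproduced from these value counts. A minimal sketch using Python's `statistics` module follows; it assumes linearly interpolated quartiles (`method="inclusive"`) and the sample (n - 1) variance, both of which agree with the reported figures. The helper names are hypothetical.

```python
import statistics

# Age atSampling value -> count, copied from the frequency table.
AGE_COUNTS = {32: 3, 36: 2, 39: 4, 52: 2, 59: 2, 60: 2, 67: 3, 69: 2, 81: 2}


def expand(counts: dict) -> list:
    """Turn a value -> count mapping into a sorted list of observations."""
    return sorted(value for value, n in counts.items() for _ in range(n))


def summarize(values: list) -> dict:
    """Recompute the headline statistics for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "n": len(values),
        "sum": sum(values),
        "mean": mean,
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "range": max(values) - min(values),
        "variance": statistics.variance(values),
        "std": std,
        "cv": std / mean,  # coefficient of variation
    }


if __name__ == "__main__":
    for name, value in summarize(expand(AGE_COUNTS)).items():
        print(f"{name}: {value}")
```

For example, the interquartile range is Q3 - Q1 = 67 - 39 = 28, and the coefficient of variation is the standard deviation divided by the mean.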

ModelNotes
Categorical

High correlation  Missing 

Distinct: 3
Distinct (%): 27.3%
Missing: 11
Missing (%): 50.0%
Memory size: 308.0 B
No PDX Growth
PDX Model Derivation: P0 slow growth; material pooled from Day 300 tumor due to age-related mortality and implanted into P1.
PDX IHC/Path: CD34+ (diffusely), SMA+ (rare cells)

Length

Max length: 124
Median length: 14
Mean length: 50.545455
Min length: 14

Characters and Unicode

Total characters: 556
Distinct characters: 45
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PDX IHC/Path: CD34+ (diffusely), SMA+ (rare cells)
2nd row: PDX IHC/Path: CD34+ (diffusely), SMA+ (rare cells)
3rd row: No PDX Growth
4th row: No PDX Growth
5th row: No PDX Growth

Common Values

Value | Count | Frequency (%)
No PDX Growth | 6 | 27.3%
PDX Model Derivation: P0 slow growth; material pooled from Day 300 tumor due to age-related mortality and implanted into P1. | 3 | 13.6%
PDX IHC/Path: CD34+ (diffusely), SMA+ (rare cells) | 2 | 9.1%
(Missing) | 11 | 50.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
pdx 11
 
12.0%
growth 9
 
9.8%
no 6
 
6.5%
model 3
 
3.3%
derivation 3
 
3.3%
p0 3
 
3.3%
slow 3
 
3.3%
material 3
 
3.3%
pooled 3
 
3.3%
from 3
 
3.3%
Other values (17) 45
48.9%

Most occurring characters

ValueCountFrequency (%)
87
15.6%
o 45
 
8.1%
t 38
 
6.8%
e 33
 
5.9%
r 31
 
5.6%
a 31
 
5.6%
l 27
 
4.9%
i 20
 
3.6%
d 20
 
3.6%
P 19
 
3.4%
Other values (35) 205
36.9%

Most occurring categories

ValueCountFrequency (%)
(unknown) 556
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
87
15.6%
o 45
 
8.1%
t 38
 
6.8%
e 33
 
5.9%
r 31
 
5.6%
a 31
 
5.6%
l 27
 
4.9%
i 20
 
3.6%
d 20
 
3.6%
P 19
 
3.4%
Other values (35) 205
36.9%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 556
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
87
15.6%
o 45
 
8.1%
t 38
 
6.8%
e 33
 
5.9%
r 31
 
5.6%
a 31
 
5.6%
l 27
 
4.9%
i 20
 
3.6%
d 20
 
3.6%
P 19
 
3.4%
Other values (35) 205
36.9%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 556
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
87
15.6%
o 45
 
8.1%
t 38
 
6.8%
e 33
 
5.9%
r 31
 
5.6%
a 31
 
5.6%
l 27
 
4.9%
i 20
 
3.6%
d 20
 
3.6%
P 19
 
3.4%
Other values (35) 205
36.9%

ProvidedTissue Origin
Categorical

High correlation  Imbalance 

Distinct: 2
Distinct (%): 9.1%
Missing: 0
Missing (%): 0.0%
Memory size: 308.0 B
Primary
20 
Metastatic Site
 
2

Length

Max length: 15
Median length: 7
Mean length: 7.7272727
Min length: 7

Characters and Unicode

Total characters: 170
Distinct characters: 13
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Metastatic Site
2nd row: Metastatic Site
3rd row: Primary
4th row: Primary
5th row: Primary

Common Values

Value | Count | Frequency (%)
Primary | 20 | 90.9%
Metastatic Site | 2 | 9.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
primary 20
83.3%
metastatic 2
 
8.3%
site 2
 
8.3%

Most occurring characters

ValueCountFrequency (%)
r 40
23.5%
a 24
14.1%
i 24
14.1%
P 20
11.8%
m 20
11.8%
y 20
11.8%
t 8
 
4.7%
e 4
 
2.4%
M 2
 
1.2%
s 2
 
1.2%
Other values (3) 6
 
3.5%

Most occurring categories

ValueCountFrequency (%)
(unknown) 170
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
r 40
23.5%
a 24
14.1%
i 24
14.1%
P 20
11.8%
m 20
11.8%
y 20
11.8%
t 8
 
4.7%
e 4
 
2.4%
M 2
 
1.2%
s 2
 
1.2%
Other values (3) 6
 
3.5%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 170
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
r 40
23.5%
a 24
14.1%
i 24
14.1%
P 20
11.8%
m 20
11.8%
y 20
11.8%
t 8
 
4.7%
e 4
 
2.4%
M 2
 
1.2%
s 2
 
1.2%
Other values (3) 6
 
3.5%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 170
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
r 40
23.5%
a 24
14.1%
i 24
14.1%
P 20
11.8%
m 20
11.8%
y 20
11.8%
t 8
 
4.7%
e 4
 
2.4%
M 2
 
1.2%
s 2
 
1.2%
Other values (3) 6
 
3.5%

Interactions

[Figures: 9 interaction plots]

Correlations

Variable | Able to ViablyPassage in nude mice | Age atDiagnosis | Age atSampling | BiologicalSex | BiopsySite | CollectionDate | Date ofDiagnosis | Grade StageInformation | Has KnownMetastaticDisease | Has Smoked100 Cigarettes | Human PathogenTesting Summary | MSI Status | ModelNotes | Molecular andIHC Data | Occupation | PDX GrowthCurve Avail | Patient ID | PatientNotes | ProvidedTissue Origin | Self-ReportedEthnicity | Self-ReportedRace | Specimen ID | StandardizedRegimen | Timing
Able to ViablyPassage in nude mice | 1.000 | 0.647 | 0.749 | 0.369 | 0.752 | 0.795 | 0.795 | 0.537 | 0.551 | 0.236 | 0.339 | 0.542 | 1.000 | 0.866 | 0.791 | 0.542 | 0.795 | 0.816 | 0.289 | 0.256 | 0.536 | 0.795 | 0.211 | 0.000
Age atDiagnosis | 0.647 | 1.000 | 0.960 | 0.581 | 0.825 | 0.866 | 0.866 | 0.707 | 0.374 | 0.664 | 0.560 | 0.520 | 0.500 | 0.926 | 0.877 | 0.520 | 0.090 | 0.943 | 0.327 | 0.560 | 0.494 | 0.866 | 0.384 | 0.000
Age atSampling | 0.749 | 0.960 | 1.000 | 0.381 | 0.785 | 0.866 | 0.866 | 0.871 | 0.588 | 0.568 | 0.560 | 0.715 | 0.866 | 0.926 | 0.877 | 0.715 | 0.223 | 0.943 | 0.472 | 0.560 | 0.494 | 0.866 | 0.363 | 0.000
BiologicalSex | 0.369 | 0.581 | 0.381 | 1.000 | 0.676 | 0.775 | 0.775 | 0.307 | 0.000 | 0.381 | 0.426 | 0.000 | 0.588 | 0.816 | 0.767 | 0.000 | 0.775 | 0.784 | 0.000 | 0.214 | 0.787 | 0.775 | 0.529 | 0.000
BiopsySite | 0.752 | 0.825 | 0.785 | 0.676 | 1.000 | 0.961 | 0.961 | 0.763 | 0.666 | 0.827 | 0.708 | 0.640 | 0.866 | 1.000 | 0.953 | 0.640 | 0.961 | 0.943 | 0.806 | 0.806 | 0.827 | 0.961 | 0.287 | 0.000
CollectionDate | 0.795 | 0.866 | 0.866 | 0.775 | 0.961 | 1.000 | 1.000 | 0.840 | 0.775 | 0.795 | 0.816 | 0.775 | 0.866 | 1.000 | 1.000 | 0.775 | 1.000 | 1.000 | 0.775 | 0.775 | 0.795 | 1.000 | 0.292 | 0.000
Date ofDiagnosis | 0.795 | 0.866 | 0.866 | 0.775 | 0.961 | 1.000 | 1.000 | 0.840 | 0.775 | 0.795 | 0.816 | 0.775 | 0.866 | 1.000 | 1.000 | 0.775 | 1.000 | 1.000 | 0.775 | 0.775 | 0.795 | 1.000 | 0.292 | 0.000
Grade StageInformation | 0.537 | 0.707 | 0.871 | 0.307 | 0.763 | 0.840 | 0.840 | 1.000 | 0.278 | 0.435 | 0.000 | 0.675 | 0.866 | 0.866 | 0.845 | 0.675 | 0.840 | 0.894 | 0.922 | 0.000 | 0.156 | 0.840 | 0.419 | 0.000
Has KnownMetastaticDisease | 0.551 | 0.374 | 0.588 | 0.000 | 0.666 | 0.775 | 0.775 | 0.278 | 1.000 | 0.112 | 0.397 | 0.289 | 0.943 | 0.816 | 0.767 | 0.289 | 0.775 | 0.784 | 0.133 | 0.000 | 0.000 | 0.775 | 0.000 | 0.000
Has Smoked100 Cigarettes | 0.236 | 0.664 | 0.568 | 0.381 | 0.827 | 0.795 | 0.795 | 0.435 | 0.112 | 1.000 | 0.720 | 0.274 | 0.554 | 0.816 | 0.767 | 0.274 | 0.795 | 0.784 | 0.071 | 0.975 | 0.669 | 0.795 | 0.231 | 0.000
Human PathogenTesting Summary | 0.339 | 0.560 | 0.560 | 0.426 | 0.708 | 0.816 | 0.816 | 0.000 | 0.397 | 0.720 | 1.000 | 0.184 | 1.000 | 0.816 | 0.791 | 0.184 | 0.816 | 0.784 | 0.000 | 0.949 | 0.669 | 0.816 | 0.274 | 0.000
MSI Status | 0.542 | 0.520 | 0.715 | 0.000 | 0.640 | 0.775 | 0.775 | 0.675 | 0.289 | 0.274 | 0.184 | 1.000 | 0.943 | 0.816 | 0.767 | 0.879 | 0.775 | 0.784 | 0.000 | 0.000 | 0.354 | 0.775 | 0.000 | 0.000
ModelNotes | 1.000 | 0.500 | 0.866 | 0.588 | 0.866 | 0.866 | 0.866 | 0.866 | 0.943 | 0.554 | 1.000 | 0.943 | 1.000 | 1.000 | 0.866 | 0.943 | 0.866 | 0.866 | 0.943 | 1.000 | 0.943 | 0.866 | 0.408 | 0.000
Molecular andIHC Data | 0.866 | 0.926 | 0.926 | 0.816 | 1.000 | 1.000 | 1.000 | 0.866 | 0.816 | 0.816 | 0.816 | 0.816 | 1.000 | 1.000 | 1.000 | 0.816 | 1.000 | 1.000 | 0.816 | 1.000 | 0.816 | 1.000 | 0.535 | 0.000
Occupation | 0.791 | 0.877 | 0.877 | 0.767 | 0.953 | 1.000 | 1.000 | 0.845 | 0.767 | 0.767 | 0.791 | 0.767 | 0.866 | 1.000 | 1.000 | 0.767 | 1.000 | 1.000 | 0.767 | 1.000 | 0.767 | 1.000 | 0.375 | 0.000
PDX GrowthCurve Avail | 0.542 | 0.520 | 0.715 | 0.000 | 0.640 | 0.775 | 0.775 | 0.675 | 0.289 | 0.274 | 0.184 | 0.879 | 0.943 | 0.816 | 0.767 | 1.000 | 0.775 | 0.784 | 0.000 | 0.000 | 0.354 | 0.775 | 0.000 | 0.000
Patient ID | 0.795 | 0.090 | 0.223 | 0.775 | 0.961 | 1.000 | 1.000 | 0.840 | 0.775 | 0.795 | 0.816 | 0.775 | 0.866 | 1.000 | 1.000 | 0.775 | 1.000 | 1.000 | 0.775 | 0.775 | 0.795 | 1.000 | 0.292 | 0.000
PatientNotes | 0.816 | 0.943 | 0.943 | 0.784 | 0.943 | 1.000 | 1.000 | 0.894 | 0.784 | 0.784 | 0.784 | 0.784 | 0.866 | 1.000 | 1.000 | 0.784 | 1.000 | 1.000 | 0.784 | 1.000 | 0.784 | 1.000 | 0.433 | 0.000
ProvidedTissue Origin | 0.289 | 0.327 | 0.472 | 0.000 | 0.806 | 0.775 | 0.775 | 0.922 | 0.133 | 0.071 | 0.000 | 0.000 | 0.943 | 0.816 | 0.767 | 0.000 | 0.775 | 0.784 | 1.000 | 0.000 | 0.000 | 0.775 | 0.543 | 0.000
Self-ReportedEthnicity | 0.256 | 0.560 | 0.560 | 0.214 | 0.806 | 0.775 | 0.775 | 0.000 | 0.000 | 0.975 | 0.949 | 0.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.775 | 1.000 | 0.000 | 1.000 | 0.975 | 0.775 | 0.348 | 0.000
Self-ReportedRace | 0.536 | 0.494 | 0.494 | 0.787 | 0.827 | 0.795 | 0.795 | 0.156 | 0.000 | 0.669 | 0.669 | 0.354 | 0.943 | 0.816 | 0.767 | 0.354 | 0.795 | 0.784 | 0.000 | 0.975 | 1.000 | 0.795 | 0.320 | 0.000
Specimen ID | 0.795 | 0.866 | 0.866 | 0.775 | 0.961 | 1.000 | 1.000 | 0.840 | 0.775 | 0.795 | 0.816 | 0.775 | 0.866 | 1.000 | 1.000 | 0.775 | 1.000 | 1.000 | 0.775 | 0.775 | 0.795 | 1.000 | 0.292 | 0.000
StandardizedRegimen | 0.211 | 0.384 | 0.363 | 0.529 | 0.287 | 0.292 | 0.292 | 0.419 | 0.000 | 0.231 | 0.274 | 0.000 | 0.408 | 0.535 | 0.375 | 0.000 | 0.292 | 0.433 | 0.543 | 0.348 | 0.320 | 0.292 | 1.000 | 0.586
Timing | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.586 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
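
Nullity correlation can be sketched as the Pearson correlation between two columns' missing-value indicators (1 where a value is missing, 0 otherwise). The Python below is a minimal illustration under that assumption; the helper name `nullity_correlation` and the toy columns in the usage lines are hypothetical and are not taken from this dataset.

```python
import math


def nullity_correlation(col_a: list, col_b: list) -> float:
    """Pearson correlation of the 'is missing' indicators of two columns.

    Missing values are represented by None. Returns NaN when either column
    is entirely missing or entirely present, since its indicator is constant.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same length")
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return math.nan
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    notes = [None, "a", None, "b"]      # hypothetical column
    markers = [None, "x", None, "y"]    # missing in exactly the same rows
    print(nullity_correlation(notes, markers))
```

A value near 1 means the two columns tend to be missing in the same rows, a value near -1 means one tends to be present where the other is missing, and a value near 0 means the patterns are unrelated.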

Sample

Index | Patient ID | Timing | Date RegimenStarted | StandardizedRegimen | Best Response | Self-ReportedRace | Self-ReportedEthnicity | BiologicalSex | Age atDiagnosis | AdditionalMedicalHistory | Date ofDiagnosis | Grade StageInformation | Has KnownMetastaticDisease | Has Smoked100 Cigarettes | Molecular andIHC Data | Occupation | PatientNotes | Specimen ID | BiopsySite | PDX GrowthCurve Avail | MSI Status | CollectionDate | Human PathogenTesting Summary | Able to ViablyPassage in nude mice | Age atSampling | ModelNotes | ProvidedTissue Origin
0 | 111316 | Current | 08/2021 | Ripretinib | NaN | White | Not Hispanic or Latino | Male | 58 | Final Pathology: Dx confirmed; mitotic rate is 47/30 HPF; treatment effect present w/central necrosis. | 07/2020 | Grade, TNM (Clinical) | Yes | No | IHC: DOG1+, CD117+; S100-, CK AE1/AE3- | Engineer | Tumor Grade/Stage: cM0 (at diagnosis); high grade \nLocation of known metastasis: liver, jejunum, ileum | 319-R | Liver [central] | Yes | MSI-Stable | 11/2021 | Negative | Yes | 59 | PDX IHC/Path: CD34+ (diffusely), SMA+ (rare cells) | Metastatic Site
1 | 111316 | Prior | 07/2020 | Imatinib Mesylate | NaN | White | Not Hispanic or Latino | Male | 58 | Final Pathology: Dx confirmed; mitotic rate is 47/30 HPF; treatment effect present w/central necrosis. | 07/2020 | Grade, TNM (Clinical) | Yes | No | IHC: DOG1+, CD117+; S100-, CK AE1/AE3- | Engineer | Tumor Grade/Stage: cM0 (at diagnosis); high grade \nLocation of known metastasis: liver, jejunum, ileum | 319-R | Liver [central] | Yes | MSI-Stable | 11/2021 | Negative | Yes | 59 | PDX IHC/Path: CD34+ (diffusely), SMA+ (rare cells) | Metastatic Site
2 | 246632 | Prior | NaN | Treatment naive | NaN | White | Not Hispanic or Latino | Male | 35 | Concurrent malignancy - SDHB gene mutation associated paraganglioma | 11/2017 | None Provided | Not Reported | No | c.445C>T (p.Q149*) variant in the SDHB gene | insurance industry | Pt has concurrent malignancies - GIST and SDHB gene mutation associated paraganglioma | 235-R1 | Gastric | No | Unknown | 08/2018 | Negative | Unknown | 36 | No PDX Growth | Primary
3 | 246632 | Current | NaN | Treatment naive | NaN | White | Not Hispanic or Latino | Male | 35 | Concurrent malignancy - SDHB gene mutation associated paraganglioma | 11/2017 | None Provided | Not Reported | No | c.445C>T (p.Q149*) variant in the SDHB gene | insurance industry | Pt has concurrent malignancies - GIST and SDHB gene mutation associated paraganglioma | 235-R1 | Gastric | No | Unknown | 08/2018 | Negative | Unknown | 36 | No PDX Growth | Primary
4 | 358529 | Prior | 01/2018 | Imatinib Mesylate | PR | White | Not Hispanic or Latino | Female | 51 | Final Pathology: Dx confirmed with treatment effect present; 50% viable tumor remains | 12/2017 | Grade, TNM (Pathological) | Not Reported | Yes | NaN | Department Manger | Tumor Stage/Grade: G1 - 1 low grade; pT3 pN0 pM not applicable | 109-R | Stomach | No | Unknown | 04/2018 | Negative | Unknown | 52 | No PDX Growth | Primary
5 | 358529 | Current | NaN | No Current Therapy | NaN | White | Not Hispanic or Latino | Female | 51 | Final Pathology: Dx confirmed with treatment effect present; 50% viable tumor remains | 12/2017 | Grade, TNM (Pathological) | Not Reported | Yes | NaN | Department Manger | Tumor Stage/Grade: G1 - 1 low grade; pT3 pN0 pM not applicable | 109-R | Stomach | No | Unknown | 04/2018 | Negative | Unknown | 52 | No PDX Growth | Primary
6 | 429767 | Prior | 05/2007 | Imatinib Mesylate | Stable Disease | Black or African American | Not Hispanic or Latino | Female | 53 | Site of resection included spleen, kidney, distal pancreatectomy and retroperitoneal tumor | 01/2007 | None Provided | Yes | Yes | NaN | Disabled | NaN | 202-R | Abdominal Mass | Yes | MSI-Stable | 07/2016 | Negative | Yes | 60 | NaN | Primary
7 | 429767 | Current | NaN | No Current Therapy | NaN | Black or African American | Not Hispanic or Latino | Female | 53 | Site of resection included spleen, kidney, distal pancreatectomy and retroperitoneal tumor | 01/2007 | None Provided | Yes | Yes | NaN | Disabled | NaN | 202-R | Abdominal Mass | Yes | MSI-Stable | 07/2016 | Negative | Yes | 60 | NaN | Primary
8 | 627122 | Prior | NaN | Treatment naive | NaN | White | Not Hispanic or Latino | Male | 39 | NaN | 02/2017 | None Provided | Not Reported | No | NaN | Park Service | NaN | 101-R | Gastric Fundus | Yes | MSI-Stable | 04/2017 | Negative | Yes | 39 | NaN | Primary
9 | 627122 | Current | NaN | Treatment naive | NaN | White | Not Hispanic or Latino | Male | 39 | NaN | 02/2017 | None Provided | Not Reported | No | NaN | Park Service | NaN | 101-R | Gastric Fundus | Yes | MSI-Stable | 04/2017 | Negative | Yes | 39 | NaN | Primary
| | Patient ID | Timing | Date Regimen Started | Standardized Regimen | Best Response | Self-Reported Race | Self-Reported Ethnicity | Biological Sex | Age at Diagnosis | Additional Medical History | Date of Diagnosis | Grade Stage Information | Has Known Metastatic Disease | Has Smoked 100 Cigarettes | Molecular and IHC Data | Occupation | Patient Notes | Specimen ID | Biopsy Site | PDX Growth Curve Avail | MSI Status | Collection Date | Human Pathogen Testing Summary | Able to Viably Passage in nude mice | Age at Sampling | Model Notes | Provided Tissue Origin |
| 12 | 738633 | Prior | 03/2014 | Imatinib Mesylate | NaN | Not Provided | Not Provided | Female | 31 | NaN | 03/2014 | None Provided | Not Reported | Not Provided | NaN | NaN | NaN | 008-R | Stomach/Liver | Yes | MSI-Stable | 01/2015 | Negative \n | Unknown | 32 | NaN | Primary |
| 13 | 738633 | Current | NaN | No Current Therapy | <Unknown> | Not Provided | Not Provided | Female | 31 | NaN | 03/2014 | None Provided | Not Reported | Not Provided | NaN | NaN | NaN | 008-R | Stomach/Liver | Yes | MSI-Stable | 01/2015 | Negative \n | Unknown | 32 | NaN | Primary |
| 14 | 738633 | Prior | 07/2014 | Sunitinib Malate | Non-evaluable | Not Provided | Not Provided | Female | 31 | NaN | 03/2014 | None Provided | Not Reported | Not Provided | NaN | NaN | NaN | 008-R | Stomach/Liver | Yes | MSI-Stable | 01/2015 | Negative \n | Unknown | 32 | NaN | Primary |
| 15 | 814656 | Prior | 07/2013 | Imatinib Mesylate | CR | Black or African American | Not Hispanic or Latino | Female | 57 | NaN | 07/2009 | Grade, TNM | Not Reported | No | Biomarkers: c-KIT mutated, DOG1+ | Not Provided | Disease recurrence at distant site\nTumor Grade/Stage: Grade 2, T2N0M1\n | 196-R | Stomach [distal] | Yes | MSI-Stable | 07/2019 | Negative | No | 67 | PDX Model Derivation: P0 slow growth; material pooled from Day 300 tumor due to age-related mortality and implanted into P1. | Primary |
| 16 | 814656 | Current | 01/2016 | Imatinib Mesylate | NaN | Black or African American | Not Hispanic or Latino | Female | 57 | NaN | 07/2009 | Grade, TNM | Not Reported | No | Biomarkers: c-KIT mutated, DOG1+ | Not Provided | Disease recurrence at distant site\nTumor Grade/Stage: Grade 2, T2N0M1\n | 196-R | Stomach [distal] | Yes | MSI-Stable | 07/2019 | Negative | No | 67 | PDX Model Derivation: P0 slow growth; material pooled from Day 300 tumor due to age-related mortality and implanted into P1. | Primary |
| 17 | 814656 | Prior | 07/2009 | Imatinib Mesylate | PR | Black or African American | Not Hispanic or Latino | Female | 57 | NaN | 07/2009 | Grade, TNM | Not Reported | No | Biomarkers: c-KIT mutated, DOG1+ | Not Provided | Disease recurrence at distant site\nTumor Grade/Stage: Grade 2, T2N0M1\n | 196-R | Stomach [distal] | Yes | MSI-Stable | 07/2019 | Negative | No | 67 | PDX Model Derivation: P0 slow growth; material pooled from Day 300 tumor due to age-related mortality and implanted into P1. | Primary |
| 18 | 899375 | Current | NaN | No Current Therapy | NaN | White | Not Hispanic or Latino | Male | 80 | Prior Malignancy: Prostate Cancer | 06/2017 | TNM (Pathological) | Not Reported | Yes | NaN | Retired; prior occupation not provided | Tumor Grade/Stage: pT2NX\n | 194-R | Stomach | No | Unknown | 07/2017 | Negative | Unknown | 81 | No PDX Growth | Primary |
| 19 | 899375 | Prior | 01/2012 | Radiation | NaN | White | Not Hispanic or Latino | Male | 80 | Prior Malignancy: Prostate Cancer | 06/2017 | TNM (Pathological) | Not Reported | Yes | NaN | Retired; prior occupation not provided | Tumor Grade/Stage: pT2NX\n | 194-R | Stomach | No | Unknown | 07/2017 | Negative | Unknown | 81 | No PDX Growth | Primary |
| 20 | 949853 | Current | NaN | Treatment naive | NaN | White | Not Hispanic or Latino | Male | 39 | Final Pathology: Stomach - Dx confirmed, predominantly spindle-cell type. Mitotic count: 80 per 5mm2. Liver: Metastatic gastrointestinal stromal sarcoma. | 10/2015 | None Provided | Yes | Yes | IHC: DOG1+, CD117+, and PDGFR+ consistent with a gastrointestinal stromal tumor (GIST). | Plastics factory worker | Location of know metastases: liver | 013-R | Stomach | Yes | MSI-Stable | 01/2016 | Negative\n | Yes | 39 | NaN | Primary |
| 21 | 949853 | Prior | NaN | Treatment naive | NaN | White | Not Hispanic or Latino | Male | 39 | Final Pathology: Stomach - Dx confirmed, predominantly spindle-cell type. Mitotic count: 80 per 5mm2. Liver: Metastatic gastrointestinal stromal sarcoma. | 10/2015 | None Provided | Yes | Yes | IHC: DOG1+, CD117+, and PDGFR+ consistent with a gastrointestinal stromal tumor (GIST). | Plastics factory worker | Location of know metastases: liver | 013-R | Stomach | Yes | MSI-Stable | 01/2016 | Negative\n | Yes | 39 | NaN | Primary |